Convolution and deconvolution based estimates of galaxy 
scaling relations from photometric redshift surveys 

ABSTRACT 

In addition to the maximum likelihood approach, there are two other methods which 
are commonly used to reconstruct the true redshift distribution from photometric 
redshift datasets: one uses a deconvolution method, and the other a convolution. We 
show how these two techniques are related, and how this relationship can be extended 
to include the study of galaxy scaling relations in photometric datasets. We then 
show what additional information photometric redshift algorithms must output so 
that they too can be used to study galaxy scaling relations, rather than just redshift 
distributions. We also argue that the convolution based approach may permit a more 
efficient selection of the objects for which calibration spectra are required. 



Key words: methods: analytical, statistical - galaxies: formation - cosmology:
observations.



1 INTRODUCTION

The next generation of sky surveys will provide reasonably
accurate photometric redshift estimates, so there is considerable
interest in the development of techniques which can use these
noisy distance estimates to provide unbiased estimates of galaxy
scaling relations. While there exist a number of methods for
estimating photometric redshifts (Budavari 2009 and references
therein), there are fewer for using these to estimate accurate
redshift distributions (Padmanabhan et al. 2005; Sheth 2007;
Lima et al. 2008; Cunha et al. 2009), the luminosity function
(Sheth 2007), or the joint luminosity-size, color-magnitude, etc.
relations (Rossi & Sheth 2008; Christlein et al. 2009; Rossi et
al. 2010).

Ideally, the output from a photometric redshift estimator is a
normalized likelihood function which gives the probability that
the true redshift is z given the observed colors (e.g. Bolzonella
et al. 2000; Collister & Lahav 2004; Cunha et al. 2009). Let
$\mathcal{L}(z|c)$ denote this quantity; it may be skewed, bimodal,
or more generally it may assume any arbitrary shape.

Let $\zeta$ denote the mean or the most probable value of this
distribution (it does not matter which, although some of the logic
which follows is more transparent if $\zeta$ denotes the mean).
Often, $\zeta$ (sometimes with an estimate of the uncertainty on
its value) is the only quantity which is available.
Therefore, in Section 2.1 we first consider how $\zeta$ compares
with the true redshift z, and contrast the convolution and
deconvolution methods for estimating dN/dz, while in Section 2.2
we describe how to reconstruct the redshift distribution directly
from colors. Section 2.3 shows what this implies if one wishes to
use the full distribution $\mathcal{L}(z|c)$. Section 2.4 shows how
to extend the logic to the luminosity function, and Section 2.5 to
scaling relations, again by contrasting the convolution and
deconvolution methods, and showing what generalization of
$\mathcal{L}(z|c)$ is required from the photometric redshift codes
if one wishes to do this. A final section summarizes our results.

Where necessary, we write the Hubble constant as
$H_0 = 100h$ km s$^{-1}$ Mpc$^{-1}$, and we assume a spatially flat
cosmological model with
$(\Omega_m, \Omega_\Lambda, h) = (0.3, 0.7, 0.7)$, where $\Omega_m$
and $\Omega_\Lambda$ are the present-day densities of matter and
cosmological constant scaled to the critical density.



2 TO CONVOLVE OR DECONVOLVE? 

In what follows, we will use spectroscopic and photometric 
redshifts from the SDSS to illustrate some of our arguments. 
Details of how the early-type galaxy sample was selected are 
in Rossi et al. (2010); the photo-zs for this sample are from 
Csabai et al. (2003). 

Figure 1. Distribution of the difference between spectroscopic and photometric redshifts (z and $\zeta$), at fixed z (top) and $\zeta$ (bottom), in
the SDSS early-type galaxy sample. Note that $p(\zeta|z)$ is rather well centered on z, whereas $p(z|\zeta)$ is not centered on $\zeta$.



2.1 The redshift distribution 

Suppose that the true redshifts z are available for a subset 
of the objects; for now, assume that the subset is a random 
subsample of the objects in a magnitude limited catalog. 
Ideally, this subset would have the same geometry as the 
full survey, as cross-correlating the objects with spectra and 
those without allows the use of other methods (e.g. Caler 
et al. 2009). In practice, this may be difficult to achieve - 
and this is not required for the analysis which follows, pro- 
vided that the photometric redshift estimator does not have 
spatially dependent biases (e.g., as a result of photometric 
calibrations varying across the survey). 

For the objects with spectroscopic redshifts, one can study the
joint distribution of $\zeta$ and z (see Figure 1). Typically, most
photometric redshift codes are constructed to return
$\langle\zeta|z\rangle \approx z$. The codes which do so are
sometimes said to be unbiased, but they are not perfect: the
scatter around the unbiased mean is of order
$\sigma_{\zeta|z} \sim 0.05\,(1+z)$. This scatter, combined with
the fact that $\langle\zeta|z\rangle \approx z$, means that
$\langle z|\zeta\rangle \neq \zeta$: the fact that
$\langle z|\zeta\rangle$ is guaranteed to be biased is not widely
appreciated. However, we show below that it matters little whether
$\langle\zeta|z\rangle$ or $\langle z|\zeta\rangle$ are unbiased -
what matters is that the bias is accurately quantified.
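
The asymmetry between the two conditional means can be checked with a small Monte Carlo. The sketch below is only an illustration under assumed inputs: the gamma-shaped dN/dz, the Gaussian form of the scatter and the helper `binned_mean` are hypothetical choices, not the SDSS sample used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sample (assumed, not SDSS): true redshifts from a gamma distribution,
# photo-z = true z plus zero-mean Gaussian scatter of rms 0.05 (1 + z).
z = rng.gamma(4.0, 0.05, size=200_000)
zeta = z + 0.05 * (1.0 + z) * rng.standard_normal(z.size)

def binned_mean(x, y, edges):
    """Mean of y for the objects whose x falls in each bin defined by edges."""
    idx = np.digitize(x, edges) - 1
    return np.array([y[idx == k].mean() for k in range(len(edges) - 1)])

edges = np.linspace(0.0, 0.4, 9)
centers = 0.5 * (edges[:-1] + edges[1:])

mean_zeta_given_z = binned_mean(z, zeta, edges)   # <zeta|z>: tracks z by construction
mean_z_given_zeta = binned_mean(zeta, z, edges)   # <z|zeta>: pulled toward the peak of dN/dz

for c, a, b in zip(centers, mean_zeta_given_z, mean_z_given_zeta):
    print(f"bin centre {c:.3f}:  <zeta|z> = {a:.3f}   <z|zeta> = {b:.3f}")
```

In this toy model the second column follows the bin centre closely, while the third is too high at small $\zeta$ and too low at large $\zeta$, because more objects scatter out of the well-populated redshifts than into them.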

In particular, if $d\mathcal{N}/d\zeta$ and dN/dz denote the
distribution of $\zeta$ and z values in the subset of the data
where both z and $\zeta$ are available, then what matters is that
$p(\zeta|z)$ and $p(z|\zeta)$, where

$$\frac{d^2N}{dz\,d\zeta} = \frac{dN(z)}{dz}\,p(\zeta|z) = \frac{d\mathcal{N}(\zeta)}{d\zeta}\,p(z|\zeta), \tag{1}$$

are known. Note that

$$\frac{d\mathcal{N}(\zeta)}{d\zeta} = \int dz\,\frac{dN(z)}{dz}\,p(\zeta|z). \tag{2}$$

The algorithm in Sheth (2007) assumes that $p(\zeta|z)$, measured
in the subset for which both z and $\zeta$ are available, also
applies to the full sample for which z is not available. Since
$d\mathcal{N}/d\zeta$ is measured in the full dataset, and
$p(\zeta|z)$ is known, a deconvolution is then used to estimate the
true dN/dz.
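
To make the deconvolution step concrete, the following sketch inverts the binned relation n_obs = P n_true, with P[j, k] = $p(\zeta_j|z_k)$, using a Richardson-Lucy style iteration. This is one standard iterative scheme, shown here as an assumption; it is not claimed to reproduce the exact implementation of Sheth (2007). The function names, the Gaussian response and the noiseless toy dN/dz are all hypothetical.

```python
import numpy as np

def response_matrix(z_centers, zeta_centers, scatter=0.05):
    """P[j, k] = p(zeta_j | z_k): Gaussian in zeta with rms scatter * (1 + z_k),
    normalised so that each column (fixed z_k) sums to one."""
    sigma = scatter * (1.0 + z_centers)[None, :]
    P = np.exp(-0.5 * ((zeta_centers[:, None] - z_centers[None, :]) / sigma) ** 2)
    return P / P.sum(axis=0, keepdims=True)

def deconvolve(n_obs, P, n_iter=200):
    """Richardson-Lucy style iteration for n_obs = P @ n_true."""
    est = n_obs.astype(float).copy()          # start from the observed counts
    for _ in range(n_iter):
        pred = P @ est                        # counts in zeta implied by the estimate
        ratio = np.divide(n_obs, pred, out=np.zeros_like(pred), where=pred > 0)
        est *= P.T @ ratio                    # multiplicative update, keeps est >= 0
    return est

# Noiseless toy problem on a common grid for z and zeta (assumed values)
z_centers = np.linspace(0.005, 0.595, 60)
n_true = 1.0e4 * np.exp(-0.5 * ((z_centers - 0.3) / 0.1) ** 2)
P = response_matrix(z_centers, z_centers)
n_obs = P @ n_true                            # equation (2) in binned form
n_rec = deconvolve(n_obs, P)

err_before = np.abs(n_obs - n_true).sum()
err_after = np.abs(n_rec - n_true).sum()
print(f"L1 error: observed {err_before:.1f}, deconvolved {err_after:.1f}")
```

With real, noisy counts the number of iterations acts as a regularisation parameter, so it would need to be chosen with some care.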

Suppose, however, that one measured $p(z|\zeta)$ instead. Then,
because

$$\frac{dN(z)}{dz} = \int d\zeta\,\frac{d\mathcal{N}(\zeta)}{d\zeta}\,p(z|\zeta), \tag{3}$$

one could estimate the quantity on the left hand side by
'convolving' the two measurables on the right hand side. For the
data-subset in which both z and $\zeta$ are available, this is
correct by definition. Clearly, to use this method on the larger
dataset for which only $\zeta$ is available, one must assume that
$p(z|\zeta)$ in the subset from which it was measured remains
accurate in the larger dataset.

Figure 2. Distribution of $d\mathcal{N}/d\zeta$ (dotted) and dN/dz (solid); crosses show the result of convolving $d\mathcal{N}/d\zeta$ with $p(z|\zeta)$ (from the bottom
panel of Figure 1).

Rossi et al. (2010) have shown that the deconvolution method
accurately reconstructs the true dN/dz distribution from
$d\mathcal{N}/d\zeta$. Figure 2 shows that the convolution approach
also works well, even when only a random 5% of the full dataset is
used to calibrate $p(z|\zeta)$ - as displayed in Figure 1. Thus,
for the dataset in which both z and $\zeta$ are available, both
the convolution and deconvolution approaches are valid, whether or
not the means (or, for that matter, the most probable values) of
$p(z|\zeta)$ and $p(\zeta|z)$ are unbiased, and however complicated
(skewed, multimodal) the shape of these two distributions. This
remains true in the larger dataset where only $\zeta$ is known.
However, whereas the convolution approach assumes that
$p(z|\zeta)$ is the same in the calibration subset as in the full
one, the deconvolution approach assumes that $p(\zeta|z)$ is the
same.
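
The convolution estimate of equation (3) is simple to sketch numerically. In the toy example below, $p(z|\zeta)$ is measured as a column-normalised two-dimensional histogram in a random 5% subsample and then applied to the $\zeta$ counts of the full catalogue. The catalogue, the binning and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy catalogue (assumed): gamma-distributed true z, Gaussian photo-z scatter
n_obj = 400_000
z = rng.gamma(4.0, 0.05, n_obj)
zeta = z + 0.05 * (1.0 + z) * rng.standard_normal(n_obj)

z_edges = np.linspace(0.0, 0.6, 21)
zeta_edges = np.linspace(-0.3, 0.9, 41)

# Calibration subset: a random ~5% of the objects, for which z is "known"
calib = rng.random(n_obj) < 0.05
counts, _, _ = np.histogram2d(z[calib], zeta[calib], bins=[z_edges, zeta_edges])

# p(z|zeta): normalise each zeta column; columns without calibrators stay zero
col = counts.sum(axis=0, keepdims=True)
p_z_given_zeta = np.divide(counts, col, out=np.zeros_like(counts), where=col > 0)

# Equation (3) in binned form: dN/dz = sum over zeta bins of N(zeta) p(z|zeta)
n_zeta, _ = np.histogram(zeta, bins=zeta_edges)
n_z_conv = p_z_given_zeta @ n_zeta

# Comparison with the truth, and with the raw photo-z histogram
n_z_true, _ = np.histogram(z, bins=z_edges)
n_zeta_on_z_grid, _ = np.histogram(zeta, bins=z_edges)
err_raw = np.abs(n_zeta_on_z_grid - n_z_true).sum()
err_conv = np.abs(n_z_conv - n_z_true).sum()
print(f"L1 error: raw photo-z {err_raw:.0f}, convolution {err_conv:.0f}")
```

The residual error of the convolved estimate here comes from shot noise in the calibration histogram, which is why the size and selection of the spectroscopic subsample matter.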

2.2 Convolution directly from colors 

The integral in equation (3) is really a sum over all the objects
in the photometric dataset, where each object with estimated
$\zeta$ contributes to dN/dz with weight $p(z|\zeta)$:

$$\frac{dN(z)}{dz} = \int d\zeta\,\frac{d\mathcal{N}(\zeta)}{d\zeta}\,p(z|\zeta) = \sum_i p(z|\zeta_i). \tag{4}$$

Now, recall that $\zeta$ was the mean (or most probable) value of
a distribution returned by a photometric redshift code. In cases
where the observed colors c map to a unique value of $\zeta$, then
this sum over $\zeta$ is really a sum over c, and the expression
above is really

$$\frac{dN(z)}{dz} = \int dc\,\frac{d\mathcal{N}(c)}{dc}\,p(z|c) = \sum_i p(z|c_i). \tag{5}$$

Equation (5) is one of the key results of this paper.

Figure 3. Similar to Figure 1, but for the true absolute magnitude and the estimate from the photometry. Notice that $p(\mathcal{M}|M)$ is
approximately symmetrically distributed around M, whereas $p(M|\mathcal{M})$ can be both significantly offset from $\mathcal{M}$ and skewed.

Although we arrived at equation (5) by requiring the mapping
$c \to \zeta$ be one-to-one (as may be the case for, e.g., LRGs),
it is actually more general. This is because one can simply measure
$p(z|c)$ in the sample for which spectra are in hand, for the same
reason that one could measure $p(z|\zeta)$. In fact, $p(z|c)$ is an
easier measurement, since it does not depend on the output of a
photo-z code! The constraint on the mapping between c and $\zeta$
in the discussion above was simply to motivate the connection
between photo-z codes and the convolution method. Once the
connection has been made, however, there is no real reason to go
through the intermediate step of estimating $\zeta$, since all
photo-z codes use the observed colors c anyway. In this respect,
equation (5) is a more direct and natural expression to work with
than is equation (4). In particular, because $p(z|c)$ is an
observable, the convolution approach of equation (5) is
independent of any photo-z algorithm. Of course, if this method is
to work, then the subsample with spectral information must be able
to provide an accurate estimate of $p(z|c)$.
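
One possible way to measure $p(z|c_i)$ without any photo-z code is a nearest-neighbour estimate in color space: for each photometric object, take the redshifts of its k nearest spectroscopic neighbours as a sample from $p(z|c_i)$, and sum over objects as in equation (5). The sketch below assumes this particular estimator, a hypothetical two-color model (`toy_colours`) and a brute-force neighbour search; none of these choices comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def toy_colours(z, rng):
    """Hypothetical two-colour model: smooth functions of z plus Gaussian noise."""
    c1 = 1.0 + 2.0 * z + 0.05 * rng.standard_normal(z.size)
    c2 = 0.5 + 1.0 * z ** 2 + 0.05 * rng.standard_normal(z.size)
    return np.column_stack([c1, c2])

z_spec = 0.6 * rng.beta(2.0, 4.0, 1000)    # calibration objects with spectra
c_spec = toy_colours(z_spec, rng)
z_phot = 0.6 * rng.beta(2.0, 4.0, 4000)    # true z of photometric objects (for checking only)
c_phot = toy_colours(z_phot, rng)

z_edges = np.linspace(0.0, 0.6, 13)
k = 20

# Squared colour distances between every photometric and spectroscopic object
d2 = ((c_phot[:, None, :] - c_spec[None, :, :]) ** 2).sum(axis=2)
nbrs = np.argpartition(d2, k, axis=1)[:, :k]   # indices of the k nearest calibrators

# p(z|c_i) is the z-histogram of object i's neighbours (each with weight 1/k),
# so the sum over i is a weighted histogram of the calibrators' redshifts.
weights = np.bincount(nbrs.ravel(), minlength=z_spec.size) / k
dn_dz_est, _ = np.histogram(z_spec, bins=z_edges, weights=weights)
dn_dz_true, _ = np.histogram(z_phot, bins=z_edges)

rel_err = np.abs(dn_dz_est - dn_dz_true).sum() / z_phot.size
print(f"relative L1 error of the colour-based estimate: {rel_err:.3f}")
```

A tree-based neighbour search would be preferable for realistic catalogue sizes; the dense distance matrix is used here only to keep the example dependent on NumPy alone.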

2.3 Relation to photo-z algorithms 

The convolution method of the previous subsection provides a
simple way of illustrating how one should use the output from
photo-z codes that actually provide a properly calibrated
probability distribution $\mathcal{L}(z|c)$ for each set of colors
c, to estimate dN/dz. It also shows in what sense the codes should
be 'unbiased'.

In particular, equation (5) suggests that one can estimate
dN(z)/dz by summing over all the objects in the dataset, weighting
each by its $\mathcal{L}(z|c)$. This is because

$$\sum_i \mathcal{L}(z|c_i) = \frac{dN(z)}{dz} \quad {\rm if} \quad \mathcal{L}(z|c) = p(z|c). \tag{6}$$

Equation (6) shows that if $\mathcal{L}(z|c)$ does not have the
same shape as $p(z|c)$, then use of $\mathcal{L}(z|c)$ will lead to
a bias; this is the pernicious bias which must be reduced - whether
or not $\langle z|c\rangle$ equals the spectroscopic redshift is,
in some sense, irrelevant. (In the case of a one-to-one mapping
between c and $\zeta$, $\langle z|c\rangle$ is the same as the
quantity $\langle z|\zeta\rangle$ which we discussed in the
previous subsections.)

Satisfying $\mathcal{L}(z|c) = p(z|c)$ is nontrivial. This is
perhaps most easily seen by supposing that the template or training
set consists of two galaxy types (early- and late-types, say), for
which the same observed colors are associated with two different
redshifts. In this case, if the photo-z algorithms are working
well, then $\mathcal{L}(z|c)$ will be bimodal for at least some c.
However, if the sample of interest only contains LRGs, then
$p(z|c)$ may actually be unimodal. As a result,
$\mathcal{L}(z|c) \neq p(z|c)$ unless proper priors on the
templates are used, or care has been taken to ensure that the
training set is representative of the sample of interest.
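
The two-template argument can be made quantitative with a toy calculation. Below, a single color is assumed to be consistent with an early-type galaxy at z = 0.2 or a late-type galaxy at z = 0.5; the redshifts, widths and the function `stacked_L` are illustrative assumptions. With a flat prior over templates, half of the weight in $\mathcal{L}(z|c)$ lands at a redshift where an LRG-only sample has no objects.

```python
import numpy as np

z_grid = np.linspace(0.0, 0.8, 161)
dz = z_grid[1] - z_grid[0]

def gaussian(z, mu, sigma):
    """Normalised Gaussian density."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Per-template likelihoods for the same observed colour (assumed values)
like_early = gaussian(z_grid, 0.2, 0.03)
like_late = gaussian(z_grid, 0.5, 0.03)

def stacked_L(prior_early):
    """L(z|c) as a prior-weighted mixture over the two templates."""
    return prior_early * like_early + (1.0 - prior_early) * like_late

p_true = like_early            # the sample of interest contains early types only
L_flat = stacked_L(0.5)        # template prior ignores the sample selection
L_matched = stacked_L(1.0)     # template prior matched to the sample

# Fraction of the summed L(z|c) assigned to the wrong (high-z) mode
frac_spurious = L_flat[z_grid > 0.35].sum() * dz
print(f"spurious high-z fraction with a flat prior: {frac_spurious:.2f}")
```

Summing `L_flat` over the catalogue would therefore place about half of the objects near z = 0.5, whereas `L_matched` reproduces $p(z|c)$ exactly in this idealised case.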



2.4 The luminosity function 

We can perform a similar analysis of the luminosity function. In
this case, the key is to recognize that, in a magnitude limited
survey, the quantity which is most directly affected by the
photometric redshift error is not the luminosity function
$\phi(M)$ itself, but the luminosity distribution
$N(M) = V_{\rm max}(M)\,\phi(M)$ (Sheth 2007). In a spectroscopic
survey, N(M) differs from $\phi(M)$ because one sees the brightest
objects to larger distances: $V_{\rm max}(M)$ is the largest
comoving volume to which an object with absolute magnitude M could
be seen. If we use $\mathcal{M}$ to denote the absolute magnitude
estimated using the photometric redshift $\zeta$, and M its correct
value, then

$$\mathcal{N}(\mathcal{M}) = \int dM\,N(M)\,p(\mathcal{M}|M). \tag{7}$$

Sheth (2007) describes a deconvolution algorithm for estimating
N(M) given measurements of $\mathcal{N}(\mathcal{M})$ and the
assumption that $p(\mathcal{M}|M)$, measured in a subset for which
both z and $\zeta$ (hence both M and $\mathcal{M}$) are available,
also applies to the full photometric survey.

Figure 4. Same as Figure 2, but for the absolute magnitudes. Crosses show the distribution one obtains by convolving the dotted
histogram with the distributions shown in the bottom panel of Figure 3; solid histogram shows the true distribution of M.

Following the discussion in the previous section, we could instead
have measured $p(M|\mathcal{M})$, and then used the fact that

$$N(M) = \int d\mathcal{M}\,\mathcal{N}(\mathcal{M})\,p(M|\mathcal{M}) \tag{8}$$

to estimate the quantity on the left hand side by summing over the
photometric catalog on the right hand side, weighting each object
in it by $p(M|\mathcal{M})$; note that this weight depends on
$\mathcal{M}$. Figure 3 shows $p(\mathcal{M}|M)$ and
$p(M|\mathcal{M})$; notice how broad they are, and how much more
skewed and biased $p(M|\mathcal{M})$ is than $p(\mathcal{M}|M)$.
Nevertheless, Rossi et al. (2010) have shown that the deconvolution
algorithm produces good results. Figure 4 shows that the
convolution algorithm does as well.

One estimates $\phi(M)$ by dividing N(M) by $V_{\rm max}(M)$.
Since this weight is the same for all objects with the same M, one
could have added an additional weighting term to the sum above to
get

$$\phi(M) = \int d\mathcal{M}\,\mathcal{N}(\mathcal{M})\,\frac{p(M|\mathcal{M})}{V_{\rm max}(M)}. \tag{9}$$

One might have written
$\phi(\mathcal{M}) = \mathcal{N}(\mathcal{M})/V_{\rm max}(\mathcal{M})$,
so the expression above shows explicitly why the photometric errors
should be thought of as affecting N(M) and not $\phi(M)$.
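
On a grid of magnitude bins, equation (9) is a matrix product in which each row of $p(M|\mathcal{M})$ is divided by $V_{\rm max}$ of the true magnitude. The sketch below assumes a Euclidean toy $V_{\rm max}$ with no cosmology or k-correction, and made-up numbers for the photometric counts and for $p(M|\mathcal{M})$; the function names and the magnitude limit are hypothetical.

```python
import numpy as np

def vmax_euclidean(M, m_lim=17.5, area_sr=1.0):
    """Toy V_max in Mpc^3: Euclidean volume out to the distance at which an
    object of absolute magnitude M reaches apparent magnitude m_lim.
    Ignores cosmology and k-corrections (assumption for illustration)."""
    d_max = 10.0 ** ((m_lim - M - 25.0) / 5.0)    # Mpc
    return area_sr / 3.0 * d_max ** 3

def phi_from_photometric(n_phot, p_M_given_phot, M_centers):
    """Equation (9) on a grid: phi[k] = sum_j n_phot[j] p[k, j] / V_max(M_k)."""
    weights = p_M_given_phot / vmax_euclidean(M_centers)[:, None]
    return weights @ n_phot

M_centers = np.array([-23.0, -22.0, -21.0, -20.0])
n_phot = np.array([10.0, 100.0, 400.0, 300.0])     # counts in bins of photometric magnitude

# p[k, j] = p(M_k | photometric magnitude bin j); each column sums to one
p = np.array([
    [0.7, 0.1, 0.0, 0.0],
    [0.3, 0.7, 0.1, 0.0],
    [0.0, 0.2, 0.7, 0.2],
    [0.0, 0.0, 0.2, 0.8],
])

phi = phi_from_photometric(n_phot, p, M_centers)
n_true_est = p @ n_phot                            # equation (8) on the same grid
```

Because the $1/V_{\rm max}$ weight is attached to the true magnitude bin, `phi` is identical to `n_true_est / vmax_euclidean(M_centers)`, which is the statement that the errors act on N(M) rather than on $\phi(M)$.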

To make the connection to $p(z|c)$ and then $\mathcal{L}(z|c)$ it
is worth considering how one computes M from z given the observed
colors c. If there were no k-correction, then the luminosity in a
given band would be determined from the observed apparent
brightness by the square of the (cosmology dependent) luminosity
distance - the colors are not necessary. In practice however, one
must apply a k-correction; this depends on the spectral type of the
galaxy, and hence on its color. As a result, the mapping between m
and M depends on z and c. But it is still true that both M and z
are determined by c. Therefore, the spectroscopic subsample which
was previously used to estimate $p(z|c)$ also allows one to
estimate $p(M,z|c)$. The quantity of interest in the previous
section, $p(z|c)$, is simply the integral of $p(M,z|c)$ over all M.
The quantity of interest here, $p(M|c)$, is the integral of
$p(M,z|c)$ over all z. Thus, equation (8) becomes

$$N(M) = \int dc\,\frac{d\mathcal{N}(c)}{dc}\int dz\,p(M,z|c) = \int dc\,\frac{d\mathcal{N}(c)}{dc}\,p(M|c) = \sum_i p(M|c_i), \tag{10}$$

where the second to last expression writes the integral of
$p(M,z|c)$ over all z as $p(M|c)$, and the final one writes the
integral explicitly as a sum over the objects in the catalog.
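
The mapping from z to M described above can be sketched for a single object as follows. The code pushes a given $p(z|c)$ through the distance modulus of the flat cosmology adopted in this paper to obtain $p(M|c)$ at fixed apparent magnitude. The k-correction is left as a user-supplied function that defaults to zero, and the apparent magnitude, the Gaussian $p(z|c)$ and all function names are assumptions for illustration.

```python
import numpy as np

C_KM_S = 299792.458
H0, OMEGA_M, OMEGA_L = 70.0, 0.3, 0.7     # flat cosmology adopted in the text

def luminosity_distance(z, n_step=2000):
    """d_L in Mpc for flat LCDM, by trapezoidal integration of 1/E(z)."""
    zz = np.linspace(0.0, z, n_step + 1)
    inv_E = 1.0 / np.sqrt(OMEGA_M * (1.0 + zz) ** 3 + OMEGA_L)
    integral = (zz[1] - zz[0]) * (inv_E.sum() - 0.5 * (inv_E[0] + inv_E[-1]))
    return (1.0 + z) * C_KM_S / H0 * integral

def absolute_magnitude(m, z, kcorr=0.0):
    """M = m - distance modulus - k-correction (requires z > 0)."""
    return m - 5.0 * np.log10(luminosity_distance(z)) - 25.0 - kcorr

def p_M_from_p_z(m, z_grid, p_z, M_edges, kcorr_fn=lambda z: 0.0):
    """Histogram of M(z) weighted by p(z|c): a binned p(M|c) for one object.
    kcorr_fn is a placeholder; a real k-correction depends on z and colour."""
    M = np.array([absolute_magnitude(m, z, kcorr_fn(z)) for z in z_grid])
    hist, _ = np.histogram(M, bins=M_edges, weights=p_z)
    return hist

# One hypothetical object: m = 17.0 and a Gaussian p(z|c) centred on z = 0.2
z_grid = np.linspace(0.05, 0.35, 61)
p_z = np.exp(-0.5 * ((z_grid - 0.2) / 0.03) ** 2)
p_z /= p_z.sum()
M_edges = np.linspace(-25.5, -19.0, 27)
p_M = p_M_from_p_z(17.0, z_grid, p_z, M_edges)
```

Summing such `p_M` arrays over all objects in the catalogue gives the last expression in equation (10).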

The expression above is the convolution-type estimate of N(M); it
does not require a photometric redshift code. However, in
principle, a photometric redshift code could output
$\mathcal{L}(M,z|c)$: the quantity such codes currently output,
$\mathcal{L}(z|c)$, is the integral of $\mathcal{L}(M,z|c)$ over
all M. The relevant weighted sum becomes

$$N(M) = \sum_i \mathcal{L}(M|c_i), \tag{11}$$

where $\mathcal{L}(M|c)$ is the integral of $\mathcal{L}(M,z|c)$
over all z, the sum is over all the objects in the catalog, and the
method only works if $\mathcal{L}(M|c) = p(M|c)$.

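Note that the luminosity density (in solar units) can, therefore,
be written as

$$\begin{aligned}
\int dM\,\phi(M)\,10^{-0.4(M-M_\odot)}
&= \int dM\,N(M)\,\frac{10^{-0.4(M-M_\odot)}}{V_{\rm max}(M)} \\
&= \int d\mathcal{M}\,\mathcal{N}(\mathcal{M}) \int dM\,p(M|\mathcal{M})\,\frac{10^{-0.4(M-M_\odot)}}{V_{\rm max}(M)} \\
&= \int d\mathcal{M}\,\mathcal{N}(\mathcal{M})\,\left\langle \frac{10^{-0.4(M-M_\odot)}}{V_{\rm max}(M)} \,\Big|\, \mathcal{M} \right\rangle \\
&= \sum_i \left\langle \frac{10^{-0.4(M-M_\odot)}}{V_{\rm max}(M)} \,\Big|\, c_i \right\rangle.
\end{aligned} \tag{12}$$

The second to last line shows that one requires the average of
$L/V_{\rm max}(L)$ summed over the distribution $p(M|\mathcal{M})$;
this is easily computed from distributions like those shown in the
bottom panel of Figure 3. The final expression writes this as a sum
over the observed distribution of colors.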



2.5 Galaxy scaling relations 

Although the previous section considered the luminosity function
in a single band, it is clear that the photometric redshift codes
could output $\mathcal{L}(\mathbf{M},z|c)$, where $\mathbf{M}$ is a
set of absolute luminosities (typically, these will be those
associated with the various band passes from which the colors c
were determined). Hence, the color magnitude relation, which is
really a statement about the joint distribution in two bands, can
be estimated by

$$N(\mathbf{M}) = \int dc\,\frac{d\mathcal{N}(c)}{dc}\int dz\,p(\mathbf{M},z|c) = \int dc\,\frac{d\mathcal{N}(c)}{dc}\,p(\mathbf{M}|c) = \sum_i p(\mathbf{M}|c_i). \tag{13}$$

Galaxy scaling relations can be estimated similarly, if we simply
interpret $\mathbf{M}$ as being the vector of observables which can
include sizes, etc. (not just luminosities). In principle,
quantities other than colors (e.g., apparent magnitudes, surface
brightness, axis ratios) can play a role in the photometric
redshift determination; this can be incorporated into the formalism
simply by using c to now denote the full set of observables from
which the redshift and other intrinsic quantities $\mathbf{M}$ were
estimated.
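
Once the stacked joint distribution of equation (13) is available on a grid, a scaling relation such as the mean size at fixed luminosity is a conditional mean of that grid. The sketch below assumes three hypothetical objects, each with a normalised $p(M, \log R\,|\,c_i)$ on a tiny grid of 2 magnitude bins by 3 size bins; the numbers and function names are purely illustrative.

```python
import numpy as np

def stack_joint(p_per_object):
    """Equation (13) on a grid: N[a, b] = sum_i p(M_a, logR_b | c_i)."""
    return np.sum(p_per_object, axis=0)

def conditional_mean(N_joint, y_centers):
    """<y|x> for each x bin of N_joint[x_bin, y_bin]."""
    return (N_joint @ y_centers) / N_joint.sum(axis=1)

# p_per_object[i, a, b]: object i, magnitude bin a, size bin b (each sums to one)
p_per_object = np.array([
    [[0.5, 0.5, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.5, 0.0], [0.0, 0.25, 0.25]],
    [[0.0, 0.0, 0.0], [0.0, 0.5, 0.5]],
])
logR_centers = np.array([0.0, 0.5, 1.0])   # illustrative log10(R/kpc) bin centres

N_joint = stack_joint(p_per_object)
size_luminosity = conditional_mean(N_joint, logR_centers)
print(size_luminosity)   # mean log R in each magnitude bin
```

In a magnitude limited sample the stacked counts would also need the $1/V_{\rm max}$ weighting discussed in Section 2.4 before being interpreted as a volume-limited relation.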

If one wishes to use the output from a photo-z code, rather than
from the spectroscopic subset, one would use

$$N(\mathbf{M}) = \sum_i \mathcal{L}(\mathbf{M}|c_i), \tag{14}$$

having checked that, in the spectroscopic subset,
$\mathcal{L}(\mathbf{M}|c_i) = p(\mathbf{M}|c_i)$.



3 DISCUSSION 

We showed how previous work on deconvolution algorithms for making
unbiased reconstructions of galaxy distributions and scaling
relations (Sheth 2007; Rossi & Sheth 2008; Rossi et al. 2010) could
be related to convolution-based methods. Whereas deconvolution
based methods require accurate knowledge of $p(\zeta|z)$, the
distribution of the photometric redshift $\zeta$ given the true
redshift z, convolution based methods require accurate knowledge of
$p(z|\zeta)$. Since $\zeta$ is derived from photometry, this may
more generally be written as $p(z|c)$, where c is the vector of
observed photometric parameters which were used to estimate the
redshift. In both cases, $p(z|c)$ and $p(\zeta|z)$ are calibrated
from a sample in which z is known, and are then used in a larger
sample where z is not available. If the smaller training set has
the same selection limits as the larger dataset (e.g., both have
the same magnitude limit) then both approaches are valid. We
illustrated our arguments with measurements in the SDSS (Figures
1-4).

We also showed what additional information must be output from
photometric redshift codes if their results are to be used in a
convolution-like approach to provide unbiased estimates of galaxy
scaling relations. In particular, we argued that only if the
redshift distribution output by a photo-z algorithm,
$\mathcal{L}(z|c)$, has the same shape as $p(z|c)$, can the
algorithm be said to be unbiased. Only in this case can its output
(available for the full sample) be used in place of $p(z|c)$ (which
is typically available for a small subset). The safest way to
accomplish this is for the training set to be a random subsample of
the full dataset - and to then tune the algorithm so that
$\mathcal{L}(z|c) = p(z|c)$. If the training set is not
representative, then care must be taken to ensure that
$\mathcal{L}(z|c)$ does not yield biased results.

Obtaining spectra is expensive, so the question arises as to
whether or not there is a more efficient alternative to the random
sample approach. For the convolution method, which requires
$p(z|c)$, the answer is clearly 'yes'. This is because some color
combinations (e.g. the red sequence) might give rise to a narrow
$p(z|c)$ distribution, whereas others may result in broader
distributions. Since it will take fewer objects to accurately
estimate the shape of a narrow $p(z|c)$ distribution than a broad
one, observational effort would be better placed in obtaining
spectra for those objects which produce broad $p(z|c)$
distributions. For the deconvolution approach, one would like to
preferentially target those redshifts z which produce broader
$p(\zeta|z)$ distributions - for similar reasons. But, since z is
not known until the spectra are taken, this cannot be done, so
taking a random sample of the full dataset is the safest way to
proceed.
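
One way to turn this targeting argument into numbers is to split a fixed spectroscopic budget among color bins in proportion to the number of objects in each bin times the width of its $p(z|c)$. This is the Neyman allocation of stratified sampling, used here as an assumed formalisation rather than a prescription from this paper; the bin counts, widths and budget are made up, and in practice the widths would have to come from a pilot sample.

```python
import numpy as np

def neyman_allocation(n_objects, sigma_z, budget):
    """Split `budget` spectra among colour bins in proportion to N_c * sigma_c."""
    weight = np.asarray(n_objects, dtype=float) * np.asarray(sigma_z, dtype=float)
    raw = budget * weight / weight.sum()
    alloc = np.floor(raw).astype(int)
    # Hand the spectra lost to rounding to the bins with the largest remainders
    leftover = budget - alloc.sum()
    order = np.argsort(raw - alloc)[::-1]
    alloc[order[:leftover]] += 1
    return alloc

# Three hypothetical colour bins: a populous bin with a narrow p(z|c)
# (red-sequence-like), and two sparser bins with broader p(z|c).
n_objects = [5000, 3000, 2000]
sigma_z = [0.02, 0.05, 0.10]

alloc = neyman_allocation(n_objects, sigma_z, budget=500)
print(alloc)   # a purely random 5% sample would instead give about 250, 150, 100
```

Compared with random sampling, this scheme moves spectra away from the populous narrow bin and toward the sparse broad one, which is the direction suggested by the argument above.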

Our methods permit accurate measurement of many 
scaling relations for which spectra were previously thought 
to be necessary (e.g. the color-magnitude relation, the size- 
surface brightness relation, the Photometric Fundamental 
Plane), so we hope that our work will permit photomet- 
ric redshift surveys to provide more stringent constraints on 
galaxy formation models at a fraction of the cost of spectro- 
scopic surveys. 